METHOD AND DEVICE FOR DYNAMICALLY MANAGING THE MESSAGE RETRANSMISSION DELAY ON AN INTERCONNECTION NE
Patent abstract:
A network interface controller for dynamically managing a retransmission delay of a message within an interconnection network, wherein the controller is capable of resending a message if a message retransmission delay is exceeded, and characterized in that it comprises: - a communication module able to receive a message transmission instruction, said message transmission instruction comprising characteristic data of the message, - at least one transmission buffer memory capable of storing at least a portion of the characteristic data of the message and adapted to associate it with a retransmission delay of the message, - a deceleration definition module, said deceleration definition module being able to define a value of the division factor from said characteristic data of the message, - a reference clock capable of generating a fixed frequency signal, - at least one frequency divider, said frequency divider being adapted to generate a signal at a reduced frequency from the value of the division factor and the fixed frequency signal from the reference clock, - at least one reduced-frequency clock, said reduced-frequency clock being associated with the transmission buffer memory, being able to time the retransmission delay from the signal at a reduced frequency, and being able to trigger a retransmission of the message if the retransmission delay is exceeded.
Publication number: FR3072237A1 Application number: FR1759459 Filing date: 2017-10-10 Publication date: 2019-04-12 Inventors: Ghassan Chehaibar; Jean-Luc Velut Applicant: Bull SA; Main IPC class:
Patent description:
METHOD AND DEVICE FOR THE DYNAMIC MANAGEMENT OF THE MESSAGE RETRANSMISSION DELAY ON AN INTERCONNECTION NETWORK
The invention relates to the field of the management of the end-to-end transmission of messages on a network, and more particularly to the retransmission of messages on an interconnection network. The invention relates to a computer device for the dynamic management of the end-to-end transmission of messages in an interconnection network. The invention also relates to a method for dynamic management of the delay for retransmission of a message within an interconnection network in order to compensate for a possible loss of message.
[Prior Art]
[0002] In computer networks and telecommunications, computer devices send messages to each other and often have to wait for a response, or an acknowledgment, before continuing. Indeed, these messages can be lost, duplicated or corrupted and, to compensate for the loss of messages, it is common for each message to be acknowledged. For example, when a message is sent, if after a certain time the acknowledgment is not received, the system assumes that the message is lost and retransmits it. To avoid waiting indefinitely, computing devices can include a mechanism for defining a retransmission delay that controls the triggering of an action. For example, if no response has been received for the duration of a retransmission delay, the message may be resent or the connection may be closed. The time between receipt of the message sending instruction and retransmission, also called the retransmission delay, is often based on the time required to receive a response (RTT, for Round-Trip Time), which corresponds to the time taken to send a message and receive its acknowledgment.
High performance computing, also called HPC, is developing for university research as well as for industry, especially in technical fields such as aeronautics, energy, climatology and life sciences. These calculations are generally implemented on interconnection networks called clusters. The objective of these clusters is to exceed the limits of the existing hardware by pooling resources to allow the parallel execution of instructions and the aggregation of memory and disk capacity. A cluster, also called an interconnection network, is a set of computing means (also called nodes) that are interconnected and exchange messages to perform common operations. Clusters are made up of a large number of nodes (typically several thousand), the latter being interconnected, for example hierarchically, by switches (typically several hundred). The nodes correspond to a set of computer equipment which can contain computing and/or storage capacities, for example a server, a computer or more generally a machine. However, as the size of clusters increases day by day, these clusters must be able to process more messages at higher speeds. In general, the duration of message transmission has decreased, but it remains variable, depending in particular on network conditions. This creates a challenge for the implementation of a retransmission delay management mechanism on an interconnection network which can manage message losses while minimizing the number and duration of exchanges. In the context of the development of interconnection networks, it is desirable to have low latency and rapid management of transmission errors. The general problem is then to correctly set the value of the retransmission delay so that messages are retransmitted neither too often nor too late.
In addition, it would be desirable to be able to take the size of the messages into account in the value of the retransmission delay. Traditional timing mechanisms are based on retransmission times implemented in software. Software solutions allow adaptive retransmission times to be achieved depending on parameters. Thus, the retransmission time corresponds to the current time to which a waiting time is added, and the value of the waiting time (RTO, retransmit timeout) varies according to the messages, which are generally divided into packets of identical length, each packet having its own acknowledgment. For example, some documents propose an initial RTO equal to the RTT; then, for each acknowledgment received, the RTT is measured and a new RTO is calculated, the new RTO being a function of the sequence of the last RTTs. This type of solution is implemented in the software layers of transport protocols. However, this kind of calculation cannot be done simply in a network card via a hardware implementation. In addition, such software implementations require great computing power, especially in the case of an interconnection network. This is particularly the case with applications hosted on a node, where on the one hand part of the node's resources is consumed and on the other hand the management of message retransmission depends on the resources available on said node. This can cause malfunctions in the interconnection network. In recent years, network cards have been developed which can take care of at least part of the transmission / retransmission of messages. Thus, to alleviate the problems of consumption of computing resources, certain interconnection networks may have nodes equipped with network cards comprising several transmission and reception queues. Moving some of these functions onto dedicated hardware frees up the computing power of the node for other tasks.
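As an illustration of the software approach described above, in which each new RTO is a function of the sequence of the last RTTs, the following is a minimal sketch only. It assumes an exponentially weighted moving average as the function; the text does not specify one. The names `update_rto` and `rto_after` and the parameters `alpha` and `margin` are hypothetical.

```python
def update_rto(srtt, rtt_sample, alpha=0.125, margin=1.0):
    """Return (smoothed_rtt, rto) after one acknowledged packet.

    Assumption: the "function of the last RTTs" is a weighted moving
    average. alpha and margin are hypothetical tuning parameters.
    """
    if srtt is None:
        # First measurement: the RTO starts out equal to the RTT.
        new_srtt = rtt_sample
    else:
        # Later measurements: blend the new sample into the running average.
        new_srtt = (1.0 - alpha) * srtt + alpha * rtt_sample
    return new_srtt, margin * new_srtt


def rto_after(samples):
    """Feed a sequence of RTT samples and return the final RTO."""
    srtt, rto = None, None
    for sample in samples:
        srtt, rto = update_rto(srtt, sample)
    return rto
```

Even this small calculation needs per-packet state and multiplications, which suggests why such schemes are usually left to software transport layers rather than implemented in a network card.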
Thus, the retransmission is managed by a hardware device on a network card without software intervention. However, messages are sent in an order fixed by the software layer which uses the card (the card cannot reorder them) and it is not possible to vary the waiting time via a hardware implementation so as to set the retransmission deadline properly, that is, so that messages are retransmitted neither too often nor too late. Conventionally, each message is added to the transmission list and associated with a retransmission time T such that T = waiting time + T of the last message in the list. A drawback of such methods is that they cannot take into account the characteristics of the messages, even though these can influence the RTT, nor the state of the network, which can change at any time during the processing of the message.
[0008] Thus, there is a need for new devices and methods for the dynamic management of message retransmission in an interconnection network where said management is carried out via a hardware implementation, so as to respond to the problems caused by existing methods.
[Technical problem]
[0009] The invention therefore aims to remedy the drawbacks of the prior art. In particular, the object of the invention is to propose a device for dynamic management of message retransmission, said device making it possible, via a hardware implementation, to provide an adaptive retransmission capable of varying according to the characteristics of the messages, such as the length of the messages to be transmitted, or of taking into account the congestion of the network as well as the processing times of the messages at the source and at the destination. The device should allow this management with a simple hardware implementation and without requiring complex, resource-consuming calculations.
The invention further aims to propose a method for dynamic management of message retransmission, said method allowing adaptive retransmission capable of varying in particular as a function of the characteristics of the messages without requiring complex, resource-consuming calculations.
[Brief description of the invention]
To this end, the invention relates to a network interface controller for the dynamic management of a delay for retransmission of a message within an interconnection network comprising a plurality of nodes, said network interface controller being capable of resending a message if a message retransmission time is exceeded, said network interface controller being characterized in that it comprises: - a communication module capable of receiving, from a source node, an instruction for transmitting a message to a target node, said instruction for transmitting a message comprising data characteristic of the message, - at least one transmission buffer memory capable of storing at least part of the characteristic data of the message and capable of associating it with a delay for retransmission of the message, - a deceleration definition module, said deceleration definition module being capable of defining a value of the division factor from said characteristic data of the message, - a reference clock capable of generating a fixed frequency signal, - at least one frequency divider, said frequency divider being able to generate a signal at a reduced frequency from the value of the division factor and from the fixed frequency signal from the reference clock, - at least one clock with reduced frequency, said clock with reduced frequency being associated with the transmission buffer memory, being capable of timing the delay of retransmission from the signal at a reduced frequency and being able to trigger a retransmission of the message if the retransmission time is exceeded.
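The relation between the reference clock, the division factor and the resulting delay can be written out as follows. The symbols are introduced here for illustration and are not used in the text: f_ref is the fixed reference frequency, D the division factor, f_red the reduced frequency, T_red the period of one reduced-frequency tick, N the constant waiting time counted in reduced-frequency ticks, and Δt the resulting real-time delay.

```latex
f_{\mathrm{red}} = \frac{f_{\mathrm{ref}}}{D}, \qquad
T_{\mathrm{red}} = \frac{1}{f_{\mathrm{red}}} = \frac{D}{f_{\mathrm{ref}}}, \qquad
\Delta t = N \, T_{\mathrm{red}} = \frac{N \, D}{f_{\mathrm{ref}}}
```

As a purely illustrative numerical case, with f_ref = 1 MHz and N = 100 ticks, D = 1 gives Δt = 100 µs while D = 4 gives Δt = 400 µs: the stored waiting time is unchanged, and only the rate at which it is counted down varies.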
Thus, the present invention makes it possible to dynamically manage the delay for retransmission of a message within an interconnection network, and this via a hardware implementation carried by a network interface controller. This network interface controller makes it possible, by means of a frequency divider, to vary the frequency of the clock counting down the time at one or more transmission buffer memories and therefore makes it possible to vary the retransmission time. Thus, instead of calculating a different waiting time for each message, it is the passage of time which is modified according to the characteristics of the message and of the messages being transmitted. That is, the speed of the reduced frequency clock may vary depending on the characteristics of the messages in progress; the reduced frequency clock is thus a variable reduced frequency clock. The higher the division factor, the lower the frequency and therefore the slower the flow of time as measured by the clock at reduced frequency. Such a device can for example be used to manage the time of retransmission of a message according to the characteristics of the message to be sent. In addition, the controller is able, via hardware encoding, to take full charge of the management of message transmissions and retransmissions. Thanks to this, the nodes can devote themselves exclusively to computation, while the communications are managed independently by the network interface controller. Thus, unlike other common network technologies, the network interface controller can provide very high throughput even when the system carries a heavy computational load. According to other optional characteristics of the controller: - when the message has been transmitted by the network interface controller, the deceleration definition module is configured to modify the value of the division factor.
Thus, the value of the division factor is dynamic and depends in particular on the messages in progress in the list and their status. Each message contributes, according to its state (being transmitted, transmitted) and its characteristics, to the definition of the value of the division factor. - the controller is capable of reading data to be transmitted directly from a memory of the source node and the deceleration definition module is configured to take into account a reading time of the data to be transmitted when defining the value of the division factor. The ability of the network interface controller to read at least part of the content of messages in the memory of the source node in order to send them to the target node can advantageously reduce the consumption of resources at the source node. - the controller comprises at least eight transmission buffer memories, each being associated with a frequency divider and a clock at reduced frequency. This allows the different transmission buffer memories to manage independently the retransmission delay of all the messages for which they are responsible. Thus, it is possible to adapt more finely the delay for retransmission of messages recorded in each of the transmission buffer memories. Preferably, the controller has at least sixteen transmission buffer memories.
- the characteristic message data used by the deceleration definition module to define the value of the division factor include: - the size of the message, for example measured in bytes, - the presence of data to be transmitted in the instruction, - the presence of a zone start address to be read for the source node and / or the target node, - the presence of a descriptor table address, said descriptor table comprising, preferably when the data to be transmitted is in a fragmented area, the start address and the length of each fragment on the source node and / or the target node, - the presence of virtual addressing for the target node, and / or - the type of the message, whether it is a request or a response. These different characteristics influence the processing and / or routing time of the message and it is therefore advantageous that they are taken into account when defining the value of the division factor. Preferably, the characteristic data include at least four of these elements and more preferably all of these elements. Preferably, the characteristic message data used to define the value of the division factor comprise the size of the message, for example measured in bytes. Thus, the controller makes it possible to take the length of the messages to be transmitted into account in the retransmission delay. - the value of the division factor is also calculated from values characteristic of the network. Preferably, the characteristic values of the network are associated with the occupancy / availability rate of the network. The network being composed of a plurality of channels which can each be composed of several sections, the occupancy / availability rate of the network can be the occupancy / availability rate of a channel or of a section of the channel.
This makes it possible to take into account, for the establishment of the division factor and therefore of the time of retransmission, the state of congestion of the network, which can occur at any time during the processing of the message and which does not depend on the characteristics of the message. In addition, this can be achieved via the hardware implementation of the network interface controller according to the invention and without requiring complex calculations, in particular thanks to the deceleration definition module. - the value of the division factor is further calculated from a constant, the value of said constant being such that a signal at a reduced frequency generated from the value of said constant and from the signal at fixed frequency originating from the reference clock would correspond to the time necessary for a message to make a round trip on the interconnection network with no load. This establishes a base duration that can then be changed by message and network characteristics. - the controller is configured so that when the message has been transmitted, the characteristic data of the message is modified and a new division factor is calculated by the deceleration definition module, said new division factor being calculated from the old division factor and the new data characteristic of the message transmitted. Thus, the division factor is constantly changing as a function of the load of each transmission buffer and of the progress of delivery of the messages. - the deceleration definition module is configured to define a new value of the division factor when the acknowledgment of the message has been received by the network interface controller. More particularly, the value of the division factor is modified so that a message whose acknowledgment has been received no longer affects the value of the division factor.
Thus, even more preferably, the deceleration definition module is configured to cancel the factor specific to the message when the acknowledgment of the message has been received by the network interface controller. Indeed, the message is finished, so it no longer has any influence on the frequency of the reduced frequency clock. Advantageously, considering that many messages are sent simultaneously, the acknowledgment of a message does not modify the factors of the other messages in progress. - the controller is configured so that if an instruction for transmitting a new message is received by the communication module, said instruction includes data characteristic of the new message, and part of the data characteristic of the new message is recorded in the transmission buffer memory, then a new value of the division factor is calculated by the deceleration definition module from the old division factor and said characteristic data of the new message. Thus, the value of the division factor increases with the arrival of new messages. More particularly, the value is modified if the new message is managed by the same transmission buffer as the first message. - the retransmission time is calculated from the current time as given by the reduced frequency clock(s) and a constant waiting time. This makes it possible to keep in memory the transmission or retransmission time of the message with a minimum consumption of resources.
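The life cycle of the division factor set out in the optional characteristics above (a base constant, a contribution added for each new message, a change once the message has been transmitted, and cancellation on acknowledgment) can be sketched as follows. This is a toy model under stated assumptions: the text does not give a formula, so the additive combination, the per-4096-byte cost and the virtual addressing surcharge are hypothetical, as are all names (`DecelerationModule`, `on_new_message`, `on_transmitted`, `on_ack`).

```python
class DecelerationModule:
    """Toy model of a deceleration definition module for ONE transmission buffer.

    Assumption: the division factor is the base constant plus the sum of
    per-message contributions. The description only says each message
    contributes according to its state and characteristics.
    """

    def __init__(self, base_factor):
        # Constant chosen so that the base delay matches an unloaded round trip.
        self.base_factor = base_factor
        self._costs = {}  # msg_id -> [read_cost, transit_cost]

    def on_new_message(self, msg_id, size_bytes, virtual_addressing=False):
        # Hypothetical cost model: one unit per 4096 bytes to read the data
        # from source memory, the same again for transit, plus one unit if
        # the target uses virtual addressing (possible page faults).
        read_cost = size_bytes // 4096
        transit_cost = size_bytes // 4096 + (1 if virtual_addressing else 0)
        self._costs[msg_id] = [read_cost, transit_cost]
        return self.division_factor()

    def on_transmitted(self, msg_id):
        # Once sent, the message no longer needs to be read from memory.
        self._costs[msg_id][0] = 0
        return self.division_factor()

    def on_ack(self, msg_id):
        # An acknowledged message stops influencing the clock; the
        # contributions of the other messages are left untouched.
        self._costs.pop(msg_id, None)
        return self.division_factor()

    def division_factor(self):
        return self.base_factor + sum(r + t for r, t in self._costs.values())
```

With a base factor of 10, adding an 8192-byte message and then a 4096-byte message with virtual addressing raises the factor to 14 and then 17 in this model; marking the first as transmitted lowers it to 15, and the two acknowledgments bring it back to 13 and finally 10.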
According to another aspect, the present invention relates to a method for dynamic management, by a network interface controller, of a delay for retransmission of a message within an interconnection network comprising a plurality of nodes, so as to resend a message if a message retransmission time is exceeded, said network interface controller comprising a communication module, a deceleration definition module, a reference clock, at least one transmission buffer memory, at least one frequency divider and at least one clock at reduced frequency, said method comprising the following steps: - reception from a source node, by the communication module, of an instruction for transmitting a message to a target node, said instruction for transmitting a message comprising data characteristic of the message, - storage, in the transmission buffer, of at least part of the characteristic data of the message and association of said part of the characteristic data of the message with a delay for retransmission of the message, - definition, by the deceleration definition module, of a value of the division factor from said characteristic data of the message; preferably, at least one of said characteristic data of the message is used for the definition of the value of the division factor, - generation, by the frequency divider, of a reduced frequency signal from the value of the division factor and of a fixed frequency signal from the reference clock, - counting down, by the reduced frequency clock associated with the transmission buffer, of the retransmission delay at a frequency equal to the frequency of the reduced frequency signal, and - triggering of a retransmission of the message if the retransmission time is exceeded. This method makes it possible, by means of a frequency divider, to vary the frequency of the clock counting down time at the level of the transmission buffer memory and therefore makes it possible to vary the moment of retransmission of the message.
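The countdown and triggering steps of the method can be simulated in a few lines. The sketch below is a software illustration of what the description attributes to hardware, under simplifying assumptions: the frequency divider is modelled as a counter that emits one reduced-frequency tick every `division_factor` reference ticks, and the division factor is held constant during the countdown. `ReducedClock`, `reference_ticks_until_retransmit` and the `wait` parameter are hypothetical names.

```python
class ReducedClock:
    """Reduced frequency clock driven by a reference clock through a divider."""

    def __init__(self, division_factor=1):
        self.division_factor = division_factor
        self._ref_count = 0  # reference ticks seen since the last reduced tick
        self.time = 0        # current time, in reduced-frequency ticks

    def tick_reference(self):
        # One period of the fixed frequency signal.
        self._ref_count += 1
        if self._ref_count >= self.division_factor:
            # The divider emits one reduced-frequency tick.
            self._ref_count = 0
            self.time += 1


def reference_ticks_until_retransmit(division_factor, wait=5):
    """Count reference ticks elapsed before the retransmission is triggered.

    The retransmission time is the current reduced-clock time plus a
    constant waiting time, as in the description; no acknowledgment is
    assumed to arrive during the countdown.
    """
    clock = ReducedClock(division_factor)
    retransmission_time = clock.time + wait
    ref_ticks = 0
    while clock.time < retransmission_time:
        clock.tick_reference()
        ref_ticks += 1
    return ref_ticks
```

The stored retransmission time is the same in every case; only the division factor changes how much real time elapses before it is reached. With the default waiting time of 5, a factor of 1 triggers after 5 reference ticks and a factor of 4 after 20.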
Such a method can for example be used to vary the time of retransmission as a function of the characteristics of the message to be sent. In addition, since it does not call upon the computing capacities of a node, the method does not have a negative impact on the availability of resources for the calculations. According to yet another aspect, the invention relates to a non-transitory computer-readable medium (or of mass memory type) recorded with instructions which, when executed by one or more processors, cause the one or more processors to execute the method according to the invention. Other advantages and characteristics of the invention will appear on reading the following description given by way of illustrative and nonlimiting example, with reference to the appended figures which represent: • Figure 1, a diagram of an interconnection network comprising a plurality of nodes, each node being associated with a network interface controller according to the invention; • Figure 2, a detailed schematic diagram of a node and the associated network interface controller according to the invention; • Figure 3, a schematic diagram of message exchanges over time between a source node and target nodes; • Figures 4A to 4D, detailed schematic diagrams of a network interface controller according to the invention comprising two transmission buffer memories, the different figures presenting the content of the transmission buffer memories at different times (ta, tb, tc and td) with reference to FIG. 3; • Figure 5, a schematic representation of the process for dynamic management of a retransmission delay according to the invention; • Figure 6, a flow diagram of a dynamic management of a retransmission delay according to the invention; • Figure 7, an illustration of fixed and reduced frequencies according to the invention; • Figure 8, a schematic representation of the countdown of the retransmission time by a reduced frequency clock according to the invention.
[Description of the invention]
In the following description, the term "retransmission time" means the time until which a message can wait for its acknowledgment before being considered lost. The retransmission time can correspond to the current time to which a waiting period is added. The term "waiting period" means, within the meaning of the invention, a period corresponding to the time between the handling of a message and the time when an acknowledgment should be received. The term "resend" means, within the meaning of the invention, that a system attempts to send a message again after the message has been considered lost because of too long a wait, that is to say a wait exceeding the retransmission time, or because of a negative acknowledgment. There may be a limit to the number of times a message can be resent. For example, the number of repetitions can be limited to a maximum of six attempts. Here, the term "interconnection network" means any dedicated computing network (such as an InfiniBand network), or more generally any collection of computer elements, in particular distributed processors, with physical communication links to each other. In the following description, the expression "waiting list" or "transmission buffer memory" means a memory in which is recorded the list of messages, being transmitted or already transmitted, for which an acknowledgment has not yet been received. The messages are generally placed in a waiting list or transmission buffer in the order of their processing, on a FIFO (first in, first out) basis. Advantageously, a network interface controller according to the invention comprises several waiting lists in the form of virtual containers, also called transmission buffer memories. These lists can for example be assigned to a node group or to a message type.
By "acknowledgment" is meant, within the meaning of the invention, a message sent by a target node following the reception of a message. Advantageously, there is only one acknowledgment per message sent. By "coupled" is meant, within the meaning of the invention, connected, directly or indirectly, with one or more intermediate elements. Two elements can be coupled mechanically, electrically or linked by a communication channel. By "processor" is meant, within the meaning of the invention, at least one hardware circuit configured to execute operations according to instructions contained in a code. The hardware circuit can be an integrated circuit. Examples of a processor include, but are not limited to, a central processing unit, a graphics processor, an application specific integrated circuit (ASIC) and a programmable logic circuit. By "executable operation" is meant, within the meaning of the invention, an action performed by a device or a processor unless the context indicates otherwise. Examples of executable operations include, but are not limited to, "process", "calculate", "determine", "display", "compare" or the like. In this regard, operations relate to actions and / or processes of a data processing system, for example a computer system or similar electronic computing device, which manipulates and transforms data represented as physical quantities in the memories of the computer system or other devices for storing, transmitting or displaying information. In the following description, the same references are used to designate the same elements. Interconnection networks generally use a short period of time for recovering from message errors and quickly resending failed transmissions. Typically, in an interconnection network, the total delay time is short, for example between 50 and 500 milliseconds. However, not all messages require the same duration to be transmitted to the target node, so a problem arises for messages for which the retransmission delay is too short.
In an interconnection network, the failure of a connection path results in the failure of an application that uses the failed connection path, which can lead to performance degradation. Figure 1 shows schematically an interconnection network 1 comprising a plurality of nodes 200, 201, 202, 203. The nodes 200 of the interconnection network 1 may include a communication module 210, one or more processors 220 and memory 230. Each of the nodes is associated with a network interface controller 100, 101, 102, 103 able to send a message 300. In addition, this network interface controller 100 is able to resend a message 300 if a retransmission delay 20 of the message 300 is exceeded. As detailed in Figure 2, the network interface controller 100 can be a separate device or coupled with a node 200. The network interface controller 100 can be integrated into a node 200 and share resources with the node 200, such as processors 220 and memory 230. Preferably and as illustrated, the network interface controller 100 can take the form of a device (e.g. a network card) comprising its own processors 160 and memory 170 (for example, a PCIe card, USB device or daughter card). The network interface controller is advantageously configured to construct messages to be sent by reading their data from the memory of a source node, transmit them, track messages in progress, receive messages from other nodes and generate responses (e.g. data read or write message). Thanks to this and to the other characteristics detailed here, it allows dynamic and efficient management of the end-to-end transport, with retransmission in case of message loss. The network interface controller 100 can comprise a combination of hardware and software (for example, firmware) to provide a node 200 with physical access to a communication channel of the interconnection network 1. The communication channel can be based on a wired or wireless connection.
Each network interface controller 100 can comprise one or more processors 160 which can advantageously be integrated circuits of the ASIC (application-specific integrated circuit) type and be coupled to communication libraries such as MPI (Message Passing Interface) and PGAS (Partitioned Global Address Space). Processors 160 can be coupled to memory 170, which may be able to store computer-readable instructions and data. The memory 170 of the controller 100 can store information accessible by the processors 160, comprising instructions which can be executed by the processors 160. For example, the memory 170 can include data which can be retrieved, manipulated or stored by processors 160. The memory 170 and the other memories described here can be any type of storage capable of storing information accessible by the processor concerned, such as a hard disk drive, a solid state drive, a memory card, random access memory or read only memory. The network interface controller 100 includes a communication module 110 capable of receiving, from a source node 201, an instruction 211 for transmitting a message. The instructions 211 can be a set of instructions to be executed by the processors 160. The instructions can be stored in an object code format for immediate processing by a processor, or in another language supported by the network interface controller, including scripts. The transmission instructions 211 generally comprise characteristic data 350 of the message. This characteristic data 350 can be retrieved, stored or modified by the network interface controller 100 in accordance with the instructions 211. The data can include sufficient information to identify the relevant information in a memory of a node, such as numbers, identifiers, descriptive text, property codes, references to data stored in other memories or data that is used by a function to calculate the relevant data.
The instructions will contain different data depending on their purpose. Typically, an instruction 211 for a data writing message 311 may include the following data: - The length of the data to be read in the memory of the source node, or the size of the message, for example measured in bytes, - The position of the data to be read. For example, the data can be in a contiguous area of memory or in a fragmented area. If it is a contiguous area, the instruction 211 will include a start address of the area, whereas if it is a fragmented area, the instruction 211 will include the address of a descriptor table, each element of which contains the start address of a fragment and its length (such a table is called IOVEC, for vectored I/O), - The type of addressing in memory. Indeed, the addressing can be physical or virtual, and if it is virtual there is a risk of page faults, - An identifier for the target node and possibly a particular port in this node. For example, the instruction can include identifiers of type LID (Local Identifier) or identifiers of type GID (Global Identifier) composed of a prefix and a port identifier GUID (Globally Unique Identifier), - The position at which to write the data in the target node. As before, it can be in a contiguous or fragmented area. Alternatively, an instruction for a data read message 321 may include the following data: - The length of the data to be read from the memory of the target node, - The position of the data to be read in the target node. As before, it can be in a contiguous or fragmented area, - The type of addressing in memory. Indeed, the addressing can be physical or virtual, and if it is virtual there is a risk of page faults, - The position at which to write the data in the source node. As previously, it can be in a contiguous or fragmented area,
- An identifier for the target node similar to a data write message. A message 300 (for example, 311, 312, 321, 322, 331, 341, 351) can be any message capable of, or configured to be, sent over a communication channel of the interconnection network. Messages can, for example, include data or a request for data hosted by a target node. The messages can also include a request for application data, that is to say data to be generated by the target node. Messages 300 can be formatted to support a proprietary communication protocol or an industry standard protocol (for example, IP, TCP, UDP). In addition, the controller 100 is able to send the message with a single end of message signal. In addition, the controller 100 is advantageously capable of receiving a single acknowledgment regardless of the size of the message. This saves network bandwidth by having a single response for the entire message instead of a plurality of acknowledgments corresponding to an acknowledgment for each packet of the message. Thus, the messages 300 to be sent are divided into packets sent in an Indian file where only the [0557-BULL16] first packet includes a routing header. For example, advantageously, only the first packet of the message 300 sent by the controller 100 has a routing header. Indeed, the controller 100 is able to transmit the packets exclusively via a dedicated path between the source and the target following the reception by the target of the first packet. Furthermore, said dedicated path is closed after the target controller 100 has received the last packet. Such characteristics, relating to two layers of transmission protocols: the application level (message) and the transport level (packet), make it possible to optimize network bandwidth and in particular to reduce the consumption of network bandwidth. FIG. 3 represents a succession of message transmission over time between a source node 201 and two target nodes 202, 203. 
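The packet layout described above (routing header on the first packet only, a single end-of-message signal on the last) can be sketched as follows. This is a minimal illustration under stated assumptions, not the controller's actual packet format: the `Packet` fields, the `split_message` helper and the `max_payload` parameter are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    payload: bytes
    # Hypothetical header representation: present on the first packet only.
    routing_header: Optional[dict] = None
    # Single end-of-message signal, set on the last packet only.
    end_of_message: bool = False


def split_message(payload: bytes, target_id: int, max_payload: int) -> List[Packet]:
    """Split a message into packets sent in single file (assumed layout)."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    # An empty message still produces one (empty) packet.
    chunks = [payload[i:i + max_payload]
              for i in range(0, len(payload), max_payload)] or [b""]
    packets = [Packet(chunk) for chunk in chunks]
    # Only the first packet carries routing information; the following
    # packets are assumed to use the dedicated path opened by the first one.
    packets[0].routing_header = {"target": target_id}
    packets[-1].end_of_message = True
    return packets
```

With such a layout, the target needs to acknowledge only once, after it has seen the packet flagged `end_of_message`.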
The message 311 is a data writing message, which aims to write, to a memory of the node 202, data contained in a memory of the node 201. Thus, it involves reading 311a data in the memory of the source node 201, transferring it to the target node 202 and writing it 311b in the memory of the target node 202. When the writing is finalized, the node 202 sends, via its network interface controller 102, a completion response message 312 to the source node 201. On reception of the completion message 312, the source node 201 sends an acknowledgment of receipt 313 to the target node 202. As shown in FIG. 3, in parallel with this transmission of the data writing message 311, the source node 201 has sent a data reading message 321 to the same target node 202. This involves reading 322a data in the memory of the target node 202, transferring it to the source node 201 and writing it 322b in the memory of the source node 201. Upon reception of the response message 322 from the target node 202 and after writing the data, the source node 201 sends an acknowledgment 323 to the target node 202. FIG. 3 also shows the sending of several messages 331, 341, 351 by the source node 201 to another target node 203. According to one embodiment, the network interface controller 100 comprises at least one transmission buffer memory 120 capable of storing network data such as part of the messages to be transmitted. The network controller also includes a deceleration definition module 130 capable of defining a value 30 of the division factor from said characteristic data 350 of a message to be transmitted. The network controller also includes at least one frequency divider 140, as well as a reference clock 40, which can preferably provide a precision on the order of a microsecond or better. Finally, the network controller also includes at least one reduced frequency clock 150.
The invention is based in particular on the fact that, instead of calculating a waiting time per message, the network interface controller uses a fixed waiting time while the clock speed is adapted to the messages in the course of transmission. This clock speed is generated in particular by the at least one reduced frequency clock 150. The network interface controller 100 includes a reference clock 40 generating a signal at a fixed frequency. Advantageously, the network interface controller 100 comprises one reduced frequency clock 150 per transmission buffer, or list of messages in progress, the frequency of which is derived from the frequency of the reference clock and from a value of the division factor. The calculation can for example take the following form: frequency of the reduced frequency clock = frequency of the reference clock / value of the division factor. Since the division factor can be modified over time, the reduced frequency clock 150 is also a variable frequency clock. FIGS. 4A to 4D represent the network interface controller, and more particularly the content of the transmission buffer memories 120, as time progresses (FIG. 4A: ta, FIG. 4B: tb, FIG. 4C: tc, FIG. 4D: td). As illustrated in FIG. 4A, the network interface controller 100 in an arrangement according to the invention can comprise two transmission buffer memories 121, 122, each being coupled to a reduced frequency clock 151, 152 and to a frequency divider 141, 142. In FIG. 4A, the first transmission buffer 121, dedicated to the node 201, comprises at least part of the characteristic data of the message 311 associated with a retransmission delay of the message.
In FIG. 4B, the first transmission buffer 121, dedicated to the node 201, comprises at least part of the characteristic data of the message 311 associated with a retransmission delay of the message 311, as well as at least part of the characteristic data of the message 321 associated with a retransmission delay of the message 321. In FIG. 4C, the first transmission buffer 121, dedicated to the node 201, includes the same data as in FIG. 4B, while the second transmission buffer 122, dedicated to the node 202, comprises at least part of the characteristic data of the message 331 associated with a retransmission delay of the message 331. In FIG. 4D, the first transmission buffer 121, dedicated to the node 201, comprises only at least part of the characteristic data of the message 311 associated with a retransmission delay of the message 311. Indeed, the response 322 having been received, the characteristic data of the message 321 has been deleted from the transmission buffer 121. The second transmission buffer 122, dedicated to the node 202, for its part comprises at least part of the characteristic data of the messages 331, 341 and 351 associated with a retransmission delay of the messages 331, 341 and 351. The messages remain in their list of messages in progress, or transmission buffer, until receipt of their respective responses. When the reduced frequency clock associated with said buffer memory reaches a time greater than the retransmission delay without the response having been received, the message is retransmitted, it receives a new timestamp equal to the current time of the reduced frequency clock 150, and it is placed at the bottom of the list of messages in progress. Since each transmission buffer is associated with its own reduced frequency clock, the network interface controller 100 is able to manage the retransmission delays differently from one transmission buffer memory to another.
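The list handling described above can be sketched in software as follows. This is only an illustrative model, assuming that the reduced frequency clock is an integer tick counter and that the fixed waiting time is expressed in ticks of that clock; the class and method names (`TransmissionBuffer`, `send`, `tick`, `response_received`) are hypothetical and do not come from the patent, which describes a hardware implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class PendingMessage:
    message_id: int
    timestamp: int  # value of the reduced frequency clock when last (re)sent


class TransmissionBuffer:
    """Model of one list of messages in progress and its reduced frequency clock."""

    def __init__(self, fixed_wait_ticks: int):
        if fixed_wait_ticks < 0:
            raise ValueError("fixed_wait_ticks must be non-negative")
        self.fixed_wait_ticks = fixed_wait_ticks
        self.clock = 0  # current time of the reduced frequency clock
        self.pending: Deque[PendingMessage] = deque()

    def send(self, message_id: int) -> None:
        # The message is time-stamped and appended at the bottom of the list.
        self.pending.append(PendingMessage(message_id, self.clock))

    def response_received(self, message_id: int) -> None:
        # The message data is deleted once its response has arrived.
        self.pending = deque(m for m in self.pending if m.message_id != message_id)

    def tick(self) -> List[int]:
        """Advance the reduced clock by one tick; return ids retransmitted."""
        self.clock += 1
        retransmitted = []
        # The head of the list always holds the oldest timestamp, because
        # retransmitted messages go to the bottom with a fresh timestamp.
        while self.pending and (
            self.clock - self.pending[0].timestamp > self.fixed_wait_ticks
        ):
            msg = self.pending.popleft()
            msg.timestamp = self.clock
            self.pending.append(msg)
            retransmitted.append(msg.message_id)
        return retransmitted
```

Because the waiting time is fixed in ticks, slowing the clock that drives `tick()` lengthens the real retransmission delay of every message in that buffer without any per-message computation.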
Thus, the retransmission times of the messages recorded in one transmission buffer memory can be much shorter than in another, depending on the messages transmitted but also on the characteristics of the target nodes. FIG. 5 is a schematic representation of an example of a method 500 for the dynamic management of a retransmission delay of a message according to the invention. FIG. 6 details this method in the form of a flowchart of the dynamic management of a retransmission delay according to the invention. The method begins with a step 510 of reception, from a node, of an instruction 211 for transmitting a data writing message 311, said instruction 211 comprising characteristic data 350 of the message 311. Following this reception, the network interface controller 100 stores 520 at least part of the characteristic data 350 of the message 311 and associates said part of the characteristic data 350 of the message with a retransmission delay 20 of the message. This retransmission delay 20 can be composed of an indication of the current time to which a fixed waiting time is added. In addition, preferably, each time a message is stored in a transmission buffer memory, it is time-stamped, that is to say that it is recorded in association with the current value of the reference clock 40. Then, the network interface controller 100, via the deceleration definition module 130, proceeds to a definition 530 of a value 30 of the division factor from said characteristic data 350 of the message 311. As shown in FIG. 6, this determination can be made from said characteristic data 350 of the message 300 in combination with network congestion data 351 and a constant value 352. Thus, the time of retransmission of a message will depend on the characteristics of the message but also on information about the current state of the network.
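To make the effect of the division factor on the real retransmission delay concrete, the short sketch below converts a fixed waiting time, counted in ticks of the reduced frequency clock, into seconds. It follows directly from the relation reduced frequency = reference frequency / division factor; the 1 MHz reference frequency and the 100-tick waiting time used in the usage note are assumed values chosen only for illustration.

```python
def effective_delay_seconds(fixed_wait_ticks: int,
                            division_factor: float,
                            reference_hz: float) -> float:
    """Real duration of a fixed waiting time counted by the reduced clock."""
    if division_factor <= 0 or reference_hz <= 0:
        raise ValueError("division_factor and reference_hz must be positive")
    # Reduced frequency = reference frequency / division factor.
    reduced_hz = reference_hz / division_factor
    # Each tick of the reduced clock lasts 1 / reduced_hz seconds.
    return fixed_wait_ticks / reduced_hz
```

Under these assumptions, a 100-tick waiting time lasts about 100 microseconds with a division factor of 1 and about 800 microseconds with a division factor of 8: the waiting time itself never changes, only the speed at which it is counted.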
For example, in some cases, sending the message may or may not require reading data from the source node. If so, the time of retransmission should depend on the length of the data but also on other factors such as the memory areas over which the data is distributed. For this reason, the message parameters are taken into account when determining the value of the division factor. Likewise, the time of retransmission of a message can be influenced by the actions or calculations to be performed on the target node. In addition, the propagation time of the message from the source node to the target node must advantageously be taken into account, since it can vary greatly depending on network congestion. Finally, in some cases, the response to the message may itself be subject to a retransmission delay. For example, if a message waits for a response which is itself retransmitted if its retransmission delay is exceeded, more time must be allowed before retransmitting the message. More particularly, several items of characteristic message data can be used to define the value 30 of the division factor:
- the size of the message, for example measured in bytes,
- the presence of data to be transmitted in the instruction; this is the case, for example, for programmed input/output, where the instruction already contains the data to be transmitted, which is small,
- the presence of a start address of an area to be read for the source node and/or the target node. In this case, the value of the division factor will take into account the access rate to the memory of the source node,
- the presence of a descriptor table address comprising, when the data to be transmitted is in a fragmented area, the start address and the length of each fragment for the source node and/or the target node. In the case of a fragmented area, the value of the division factor will take into account the slowdown due to the reading of the descriptor table (IOVEC, for vectored I/O) and the average size of a segment as defined in the descriptor table,
- the presence of virtual addressing for the target node. In the case of virtual addressing, the value of the division factor will take into account the slowdown due to page faults, preferably corresponding to the time required to resolve a page fault as well as the probability of incurring one and the size of a page (preferably a power of 2),
- the subject of the message: request or response. In the case of a request, the value of the division factor will take into account a slowdown to be applied considering the average processing time of a request compared to a response.
Since only one parameter is acted upon (the value of the division factor 30), such latitude makes it possible, in a simple way, to finely adapt the retransmission time to the characteristics of the message, such as the size of the message to be sent. Advantageously, the characteristic message data used to define the division factor include the presence of a start address of an area to be read for the source node and/or the target node, and the presence of a descriptor table address comprising, when the data to be transmitted is in a fragmented area, the start address and the length of each fragment for the source node and/or the target node. Thus, it is possible to take into account, when establishing the division factor and therefore the moment of retransmission, the processing time at the target node or the source node. This is achieved via a hardware implementation and without requiring complex calculations.
Once the value of the division factor 30 has been determined, the method comprises a step 540 of generating a reduced frequency signal 42a from the value of the division factor 30 and a fixed frequency signal from the reference clock 40. This reduced frequency signal 42a is used by a reduced frequency clock associated with the transmission buffer memory in which at least part of the characteristic data of the message has been recorded. It allows the retransmission delay 20 to be counted down 550 at a frequency equal to the frequency of the reduced frequency signal coming from the frequency divider 140. Then, the message 311 being a data writing message, the network interface controller 100 proceeds to a step 551 of verifying that the message 311 has been transmitted. If the message has not been transmitted, the network interface controller 100 proceeds to a step 555 of verifying that the retransmission delay has not been exceeded. If the retransmission delay has been exceeded, the message is sent to the bottom of the list for the triggering 560 of a retransmission of the message. If the retransmission delay has not been exceeded, the counting 550 continues. When the message 311 has been transmitted by the network interface controller, the deceleration definition module 130 advantageously proceeds to a modification 570 of the value of the division factor. Thus, the value of the division factor is dynamic and depends in particular on the messages in progress in the list and on their status. Each message contributes to the definition of the value of the division factor according to its state (being transmitted, transmitted) and its characteristics. Once the new value of the division factor 31 has been determined, the method comprises a step 571 of generating a second reduced frequency signal 42b from the new value of the division factor 31 and a fixed frequency signal from the reference clock.
Then, a counting 572, equivalent to the counting 550 of the retransmission delay 20, is started at a frequency equal to the frequency of the new reduced frequency signal 42b. By way of illustration, FIG. 7 represents the reference frequency 41, the first reduced frequency 42a and the second reduced frequency 42b. The network interface controller 100 then proceeds to a step 573 of verifying that the response to the message 311 has been received. If the response has not been received, the network interface controller 100 performs a step 575 of verifying that the retransmission delay has not been exceeded. If the retransmission delay has been exceeded, the message is sent to the bottom of the list for a new transmission. If the retransmission delay has not been exceeded, the counting 572 continues. When the response has been received by the network interface controller 100, the network interface controller sends 580 an acknowledgment message to the target node and deletes 585 the message data from the transmission buffer. Then, the deceleration definition module 130 advantageously proceeds to a step of modification 590 of the value of the division factor. During this step, the value of the division factor is reduced. Thus, once again, the value of the division factor is dynamic and depends in particular on the messages in progress in the list and on their status. In addition, the controller can advantageously be configured so that, if the source node 201 is waiting for an application response, the network interface controller 100 is able to receive from the target node 202 an acknowledgment message that includes the application response. In this case, the generation time of the application response is taken into account when defining the value of the division factor. This optimizes the number of messages: if a message awaits a response with application data, the acknowledgment of receipt is merged with the application response.
The value of the division factor takes this into account, so that the retransmission delay depends not only on the network but also on the processing of the message at the destination node. FIG. 8 is a schematic representation of a step of counting down the retransmission delay by a reduced frequency clock according to the invention. According to one embodiment, this step can be subdivided into several sub-steps. Step 610 corresponds to the resetting of a counter managed by a frequency divider 140 according to the invention. Step 620 corresponds to the definition of the value of the division factor 30. This step is equivalent to step 530 described above. Step 630 corresponds to the reception, by the frequency divider 140, of a signal from the reference clock 40 corresponding to the fixed frequency signal 41. The reception of this signal causes the incrementing 640 of the counter managed by the frequency divider 140. The network interface controller then proceeds to a step 650 of comparing the value of the counter managed by the frequency divider 140 to the value of the division factor 30. If the value of the counter is greater than or equal to the value of the division factor 30, the frequency divider 140 transmits 660 a signal to a reduced frequency clock 150 according to the invention, and the counter managed by the frequency divider 140 is then reset 610 to zero. The reception by the reduced frequency clock 150 of the signal emitted by the frequency divider 140 makes it possible to produce the reduced frequency 42. The value of the division factor is dynamic. In the examples below, it is denoted F_msg and depends on the characteristic data of the messages 300 in progress in the list (or transmission buffer). Each message contributes according to its state (being transmitted, transmitted) and its characteristics.
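Steps 610 to 660 of the frequency divider can be mirrored in a few lines of software. The sketch below is a behavioural model only, assuming an integer division factor; the class and method names are hypothetical, and the actual device is described as a hardware frequency divider.

```python
class FrequencyDivider:
    """Behavioural model of the counter-based frequency divider."""

    def __init__(self, division_factor: int):
        self.counter = 0  # step 610: counter reset
        self.division_factor = 1
        self.set_division_factor(division_factor)

    def set_division_factor(self, value: int) -> None:
        # Step 620: (re)definition of the division factor value.
        if value < 1:
            raise ValueError("division factor must be at least 1")
        self.division_factor = value

    def on_reference_tick(self) -> bool:
        """Step 630: called once per pulse of the fixed frequency signal.

        Returns True when a pulse is emitted to the reduced frequency clock.
        """
        self.counter += 1  # step 640: increment the counter
        if self.counter >= self.division_factor:  # step 650: comparison
            self.counter = 0  # step 610: reset to zero
            return True  # step 660: signal sent to the reduced clock
        return False
```

Feeding such a model 12 reference pulses with a division factor of 4 yields 3 pulses on the reduced frequency side; changing the factor between pulses changes the output rate immediately, which is what makes the reduced clock a variable frequency clock.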
The modification of F_msg according to the status of the messages can be written, as an example, according to the code illustrated in the table below.
Table 1. Illustration of the calculation of the value of the division factor.
Message type | Message state | Update of F_msg
Data writing message | During transmission | F_msg = F_msg + (L / page_size) * mem_f + (iovec ? iovec_f : 0) * L / iovec_size + (virtual ? pf_f : 0) * L / page_size + req_f
Data writing message | Transmission completed | F_msg = F_msg - (L / page_size) * mem_f
Data writing message | Response received | F_msg = F_msg - (iovec ? iovec_f : 0) * L / iovec_size - (virtual ? pf_f : 0) * L / page_size - req_f
Data reading message | During transmission | F_msg = F_msg + req_f
Data reading message | Transmission completed | F_msg = F_msg + (L / page_size) * mem_f + (iovec ? iovec_f : 0) * L / iovec_size + (virtual ? pf_f : 0) * L / page_size
Data reading message | Response received | F_msg = F_msg - (L / page_size) * mem_f - (iovec ? iovec_f : 0) * L / iovec_size - (virtual ? pf_f : 0) * L / page_size - req_f
Reply | During transmission | F_msg = F_msg + (L / page_size) * mem_f + (iovec ? iovec_f : 0) * L / iovec_size + (virtual ? pf_f : 0) * L / page_size
Reply | Transmission completed | F_msg = F_msg - (L / page_size) * mem_f
Reply | Response received | F_msg = F_msg - (iovec ? iovec_f : 0) * L / iovec_size - (virtual ? pf_f : 0) * L / page_size
With:
- L: message length in bytes,
- page_size: the size of a page (e.g. a power of 2),
- mem_f: parameter associated with the memory access speed, per page,
- iovec: test for the presence or absence of an iovec,
- iovec_f: parameter defining the slowdown due to the reading of iovecs,
- iovec_size: parameter associated with the average size of an iovec segment (e.g. a power of 2),
- virtual: test for the presence or absence of virtual addressing,
- pf_f: parameter defining the slowdown due to page faults; advantageously, pf_f depends on the time necessary to resolve a page fault as well as on the probability of incurring one,
- req_f: factor defining the slowdown to apply for a request.
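The updates of Table 1 can be expressed as a single function that returns the amount to add to F_msg for a given message type and state change. The sketch below is an illustration, not the patented implementation: the numeric parameter values are arbitrary assumptions, the state names (`"sending"`, `"sent"`, `"response"`) are hypothetical labels for the three rows of each message type, and floating-point division is assumed where hardware would more likely rely on shifts, since the sizes are given as powers of 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    # Arbitrary illustrative values; the patent does not give numbers.
    page_size: int = 4096
    iovec_size: int = 4096
    mem_f: float = 1.0
    iovec_f: float = 2.0
    pf_f: float = 4.0
    req_f: float = 8.0


@dataclass(frozen=True)
class Message:
    kind: str      # "write", "read" or "reply"
    length: int    # L, in bytes
    iovec: bool    # data described by a descriptor table
    virtual: bool  # virtual addressing is used


def delta_f_msg(msg: Message, event: str, p: Params) -> float:
    """Amount added to F_msg when `msg` goes through `event` (Table 1)."""
    mem = (msg.length / p.page_size) * p.mem_f
    iov = (p.iovec_f if msg.iovec else 0) * msg.length / p.iovec_size
    pf = (p.pf_f if msg.virtual else 0) * msg.length / p.page_size
    table = {
        ("write", "sending"): mem + iov + pf + p.req_f,
        ("write", "sent"): -mem,
        ("write", "response"): -(iov + pf + p.req_f),
        ("read", "sending"): p.req_f,
        ("read", "sent"): mem + iov + pf,
        ("read", "response"): -(mem + iov + pf + p.req_f),
        ("reply", "sending"): mem + iov + pf,
        ("reply", "sent"): -mem,
        ("reply", "response"): -(iov + pf),
    }
    return table[(msg.kind, event)]
```

A useful property of the table, visible in this form, is that the three updates of any message sum to zero: once its response has been received, a message no longer contributes to the division factor.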
Thus, the network interface controller 100 according to the invention and the method for dynamic management of the retransmission delay according to the invention make it possible to finely adapt the moment of retransmission of a message according to the characteristics of the message but also according to other characteristics. In addition, this is possible without complex calculations, on the basis of frequency dividers and reduced frequency clocks. All these advantages therefore contribute to reducing the risk of message loss while improving the performance of an interconnection network.
Claims:
Claims (13)

1. Network interface controller (100) for the dynamic management of a retransmission delay (20) of a message (300) within an interconnection network (1) comprising a plurality of nodes (200), said network interface controller (100) being able to resend a message (300) if a retransmission delay (20) of the message (300) is exceeded, said network interface controller (100) being characterized in that it comprises:
- a communication module (110) adapted to receive, from a source node (201), an instruction for transmitting a message to a target node, said instruction (211) for transmitting a message comprising characteristic data (350) of the message,
- at least one transmission buffer memory (120) capable of storing at least part of the characteristic data (350) of the message and able to associate it with a retransmission delay (20) of the message,
- a deceleration definition module (130), said deceleration definition module being able to define a value (30) of the division factor from said characteristic data (350) of the message,
- a reference clock (40) capable of generating a fixed frequency signal (41),
- at least one frequency divider (140), said frequency divider (140) being able to generate a signal at a reduced frequency (42) from the value (30) of the division factor and from the fixed frequency signal (41) coming from the reference clock (40),
- at least one reduced frequency clock (150), said reduced frequency clock (150) being associated with the transmission buffer memory (120), said reduced frequency clock (150) being able to time the retransmission delay (20) from the signal at a reduced frequency (42) and being capable of triggering a retransmission of the message (300) if the retransmission delay (20) is exceeded.

2. Network interface controller (100) according to claim 1, characterized in that, when the message (300) has been transmitted by the network interface controller (100), the deceleration definition module (130) is configured to modify the value (30) of the division factor.

3. Network interface controller (100) according to one of claims 1 or 2, characterized in that it is capable of reading data to be transmitted directly from a memory of the source node (201) and in that the deceleration definition module (130) is configured to take into account a reading time of the data to be transmitted during the definition of the value (30) of the division factor.

4. Network interface controller (100) according to any one of claims 1 to 3, characterized in that it comprises at least eight transmission buffer memories (120), each being associated with a frequency divider (140) and with a reduced frequency clock (150).

5. Network interface controller (100) according to any one of claims 1 to 4, characterized in that the characteristic data (350) used by the deceleration definition module (130) to define the value (30) of the division factor include:
- the size of the message, for example measured in bytes,
- the presence of data to be transmitted in the instruction,
- the presence of a start address of an area to be read for the source node and/or the target node,
- the presence of a descriptor table address comprising, when the data to be transmitted is in a fragmented area, the start address and the length of each fragment on the source node and/or the target node,
- the presence of virtual addressing for the target node, and/or
- the subject of the message, whether it is a request or a response.

6. Network interface controller (100) according to any one of claims 1 to 5, characterized in that the characteristic data (350) used to define the value (30) of the division factor include the size of the message measured in bytes.

7. Network interface controller (100) according to any one of claims 1 to 6, characterized in that the value (30) of the division factor is also calculated from characteristic values of the network.

8. Network interface controller (100) according to any one of claims 1 to 7, characterized in that the value (30) of the division factor is further calculated from a constant, the value of said constant being such that a signal at a reduced frequency (42) generated from the value of said constant and from the fixed frequency signal (41) coming from the reference clock (40) would correspond to the time necessary for a message to make a round trip on the interconnection network (1) when empty.

9. Network interface controller (100) according to any one of claims 1 to 8, characterized in that it is configured so that, when the message (300) has been transmitted, the characteristic data of the message (300) are modified and a new division factor is calculated by the deceleration definition module (130), said new division factor being calculated from the old division factor and the new characteristic data of the transmitted message.

10. Network interface controller (100) according to any one of claims 1 to 9, characterized in that the deceleration definition module (130) is configured to define a new value of the division factor when the acknowledgment of the message has been received by the network interface controller.

11. Network interface controller (100) according to any one of claims 1 to 10, characterized in that it is configured so that, if an instruction to transmit a new message is received by the communication module (110), said instruction includes characteristic data of the new message, and part of the characteristic data of the new message is recorded in the transmission buffer memory (120), then a new value of the division factor is calculated, said new value of the division factor being calculated, by the deceleration definition module (130), on the basis of the old division factor and said characteristic data of the new message.

12. Network interface controller (100) according to any one of claims 1 to 11, characterized in that the retransmission time is calculated from the current time as given by the reduced frequency clock or clocks and a constant waiting time.

13. Method (500) for dynamic management, by a network interface controller (100), of a retransmission delay of a message within an interconnection network (1) comprising a plurality of nodes (200), so as to resend a message (300) if a retransmission delay of the message is exceeded, said network interface controller comprising a communication module (110), at least one transmission buffer memory (120), a deceleration definition module (130), a reference clock (40), at least one frequency divider (140) and at least one reduced frequency clock (150), said method comprising the following steps:
- reception (510) from a source node (201), by the communication module (110), of an instruction (211) for transmitting a message (300) to a target node (202), said instruction (211) for transmitting a message comprising characteristic data (350) of the message,
- storage (520), in the transmission buffer memory (120), of at least part of the characteristic data (350) of the message and association of said part of the characteristic data of the message with a retransmission delay (20) of the message,
- definition (530), by the deceleration definition module (130), of a value (30) of the division factor from said characteristic data (350) of the message,
- generation (540), by the frequency divider (140), of a reduced frequency signal (42) from the value (30) of the division factor and of a fixed frequency signal (41) from the reference clock (40),
- counting down (550), by the reduced frequency clock (150) associated with the transmission buffer memory (120), of the retransmission delay (20) at a frequency equal to the frequency of the reduced frequency signal (42), and
- triggering (560) of a retransmission of the message (300) if the retransmission delay (20) is exceeded.
Patent family:
Publication number | Publication date
EP3470982B1 | 2020-06-17
US20190109796A1 | 2019-04-11
JP2019106697A | 2019-06-27
EP3470982A1 | 2019-04-17
FR3072237B1 | 2019-10-25
US10601722B2 | 2020-03-24
Legal status:
2018-10-25 | PLFP | Fee payment | Year of fee payment: 2
2019-04-12 | PLSC | Search report ready | Effective date: 20190412
2019-10-24 | PLFP | Fee payment | Year of fee payment: 3
2020-10-27 | PLFP | Fee payment | Year of fee payment: 4
Priority:
Application number | Filing date | Patent title
FR1759459 | 2017-10-10 |
FR1759459A | FR3072237B1 | 2017-10-10 | 2017-10-10 | METHOD AND DEVICE FOR DYNAMICALLY MANAGING THE MESSAGE RETRANSMISSION DELAY ON AN INTERCONNECTION NETWORK
EP18198985.6A | EP3470982B1 | 2017-10-10 | 2018-10-05 | Method and device for dynamically managing the message retransmission delay on an interconnection network
JP2018190591A | JP2019106697A | 2017-10-10 | 2018-10-09 | Method for dynamically managing message retransmission delay in interconnection network and device
US16/156,287 | US10601722B2 | 2017-10-10 | 2018-10-10 | Method and device for dynamically managing the message retransmission delay on an interconnection network